A new hybrid approach enabling MT for languages with little resources

Authors

  • Peter Dirix
  • Vincent Vandeghinste
  • Ineke Schuurman
Abstract

In this paper, we combine techniques from rule-based and corpus-based MT in a hybrid approach. We use only a dictionary, basic analytical resources, and a monolingual target-language corpus to enable the construction of an MT system for lesser-resourced languages. Statistical and example-based systems usually do not involve many linguistic notions. Cutting up sentences into linguistically sound subunits improves the quality of the translation. Demarcating clauses, verb groups, noun phrases, and prepositional phrases restricts the number of possible translations and hence also the search space. The sentence chunks are translated using a dictionary and a limited set of mapping rules. By matching the translated items and the higher-level structure bottom-up against the database information, one or more plausible translated sentences are constructed. A search engine ranks them using their frequencies of occurrence and the matching accuracy in the target-language corpus.
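The final ranking step described in the abstract can be illustrated with a toy sketch. This is not the authors' system: the dictionary, the corpus, the function names, and the use of plain bigram counts as a stand-in for "frequencies of occurrence and matching accuracy" are all assumptions made for illustration. It only shows the general idea of translating chunks with a dictionary and ordering the resulting candidates by how well they are attested in a monolingual target-language corpus.

```python
from collections import Counter
from itertools import product


def ngram_counts(sentences, n=2):
    """Count word n-grams in a monolingual target-language corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def candidates(source_chunks, dictionary):
    """Combine every dictionary translation of every chunk, in source order."""
    # Chunks missing from the dictionary are passed through untranslated.
    options = [dictionary.get(chunk, [chunk]) for chunk in source_chunks]
    return [" ".join(choice) for choice in product(*options)]


def score(sentence, counts, n=2):
    """Sum the corpus frequencies of the candidate's n-grams."""
    tokens = sentence.lower().split()
    return sum(counts[tuple(tokens[i:i + n])] for i in range(len(tokens) - n + 1))


def rank(source_chunks, dictionary, counts):
    """Return candidate translations, best-attested first."""
    return sorted(candidates(source_chunks, dictionary),
                  key=lambda s: score(s, counts), reverse=True)


# Hypothetical toy data (not from the paper).
dictionary = {
    "de bank": ["the bank", "the couch"],
    "is": ["is"],
    "gesloten": ["closed"],
}
corpus = [
    "the bank is closed today",
    "the bank is open",
    "the couch is comfortable",
]

counts = ngram_counts(corpus)
ranked = rank(["de bank", "is", "gesloten"], dictionary, counts)
print(ranked[0])  # the bank is closed
```

Pre-chunking the source sentence keeps the number of candidates small, since only whole chunks are combined rather than individual words; this mirrors the abstract's point that demarcating phrases restricts the search space.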

Similar resources

Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target language resources such as word lexical semantics, including information about categorial variations and subcategorization frames. These resources are used to generate multiple structural variations from ...

Proactive Learning for Building Machine Translation Systems for Minority Languages

Building machine translation (MT) for many minority languages in the world is a serious challenge. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, it becomes very important for an MT system to make best use of its resources, both labeled and unlabeled, in building a quality system. In ...

Language-independent hybrid MT with PRESEMT

The present article provides a comprehensive review of the work carried out on developing PRESEMT, a hybrid language-independent machine translation (MT) methodology. This methodology has been designed to facilitate rapid creation of MT systems for unconstrained language pairs, setting the lowest possible requirements on specialised resources and tools. Given the limited availability of resourc...

A research perspective on how to democratize machine translation and translation aids aiming at high quality final output

Machine Translation (MT) systems and Translation Aids (TA) aiming at cost-effective high quality final translation are not yet usable by small firms, departments and individuals, and handle only a few languages and language pairs. This is due to a variety of reasons, some of them not frequently mentioned. But commercial, technical and cultural reasons make it mandatory to find ways to democratiz...

Boosting Performance of Weak MT Engines Automatically: Using MT Output to Align Segments & Build Statistical Post-Editors

This paper addresses the practical challenge of improving existing, operational translation systems with relatively weak, black-box MT engines when higher quality MT engines are not available and only a limited quantity of online resources is available. Recent research results show impressive performance gains in translating between Indo-European languages when chaining mature, existing rulebas...

Publication date: 2006